Lesson 4. More Data, More Maps!

Now that we know how to pull in data, check and transform Coordinate Reference Systems (CRS), and plot sf data.frames together - let’s practice doing the same thing with other geometry types. In this notebook we’ll be bringing in bike boulevards and schools, which will get us primed to think about spatial relationship queries.


Instructor Notes

Import Libraries

library(sf)
## Linking to GEOS 3.8.1, GDAL 3.1.4, PROJ 6.3.1
library(tmap)

4.1 Berkeley Bike Boulevards

We’re going to bring in data bike boulevards in Berkeley. Note two things that are different from our previous data:

  • We’re bringing in a GeoJSON this time and not a shapefile

  • We have a line geometry GeoDataFrame (our county and states data had polygon geometries)

bike_blvds = st_read('notebook_data/transportation/BerkeleyBikeBlvds.geojson')
## Reading layer `BerkeleyBikeBlvds' from data source `/Users/hikarimurayama/Documents/repos/Geospatial-Fundamentals-in-R-with-sf/notebook_data/transportation/BerkeleyBikeBlvds.geojson' using driver `GeoJSON'
## Simple feature collection with 211 features and 10 fields
## geometry type:  MULTILINESTRING
## dimension:      XY
## bbox:           xmin: 561541.2 ymin: 4189007 xmax: 566451.6 ymax: 4193483
## projected CRS:  WGS 84 / UTM zone 10N
plot(bike_blvds$geometry)

Of course, we could also use tmap to plot our lines:

# set to view mode
tmap_mode('view')
## tmap mode set to interactive viewing
tm_shape(bike_blvds) +
  tm_lines()

As usual, we’ll want to do our usual data exploration…

head(bike_blvds)
## Simple feature collection with 6 features and 10 fields
## geometry type:  MULTILINESTRING
## dimension:      XY
## bbox:           xmin: 562293.8 ymin: 4189795 xmax: 562786.5 ymax: 4189899
## projected CRS:  WGS 84 / UTM zone 10N
##       BB_STRNAM BB_STRID    BB_FRO     BB_TO BB_SECID DIR_   Status ALT_bikeCA
## 1 Heinz/Russell      RUS       7th       8th    RUS01  E/W Existing         No
## 2 Heinz/Russell      RUS       8th       9th    RUS02  E/W Ezisting         No
## 3 Heinz/Russell      RUS       9th      10th    RUS03  E/W Existing         No
## 4 Heinz/Russell      RUS      10th San Pablo    RUS04  E/W Existing         No
## 5     San Pablo      RUS     Heinz   Russell    RUS05  N/S Existing         No
## 6       Russell      RUS San Pablo   Wallace    RUS06  E/W Exisitng          3
##   Shape_len len_km                       geometry
## 1 101.12817  0.101 MULTILINESTRING ((562293.8 ...
## 2 100.81407  0.101 MULTILINESTRING ((562391.6 ...
## 3 100.03740  0.100 MULTILINESTRING ((562489 41...
## 4 106.59288  0.107 MULTILINESTRING ((562585.7 ...
## 5  89.56348  0.090 MULTILINESTRING ((562688.9 ...
## 6  76.95699  0.077 MULTILINESTRING ((562711.4 ...
dim(bike_blvds)
## [1] 211  11
colnames(bike_blvds)
##  [1] "BB_STRNAM"  "BB_STRID"   "BB_FRO"     "BB_TO"      "BB_SECID"  
##  [6] "DIR_"       "Status"     "ALT_bikeCA" "Shape_len"  "len_km"    
## [11] "geometry"

Our bike boulevard data includes the following information:

  • BB_STRNAM - bike boulevard Streetname
  • BB_STRID - bike boulevard Street ID
  • BB_FRO - bike boulevard origin street
  • BB_TO - bike boulevard end street
  • BB_SECID- bike boulevard section id
  • DIR_ - cardinal directions the bike boulevard runs
  • Status - status on whether the bike boulevard exists
  • ALT_bikeCA - ?
  • Shape_len - length of the boulevard in meters
  • len_km - length of the boulevard in kilometers
  • geometry

Question

Why are there 211 features when we only have 8 bike boulevards?

And now take a look at our CRS…

st_crs(bike_blvds)
## Coordinate Reference System:
##   User input: WGS 84 / UTM zone 10N 
##   wkt:
## PROJCRS["WGS 84 / UTM zone 10N",
##     BASEGEOGCRS["WGS 84",
##         DATUM["World Geodetic System 1984",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4326]],
##     CONVERSION["UTM zone 10N",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-123,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",500000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["World - N hemisphere - 126°W to 120°W - by country"],
##         BBOX[0,-126,84,-120]],
##     ID["EPSG",32610]]

Let’s tranform our CRS to UTM Zone 10N, NAD83 that we used in the last lesson.

bike_blvds_utm10 = st_transform(bike_blvds, crs = 26910)
head(bike_blvds_utm10)
## Simple feature collection with 6 features and 10 fields
## geometry type:  MULTILINESTRING
## dimension:      XY
## bbox:           xmin: 562293.8 ymin: 4189795 xmax: 562786.5 ymax: 4189899
## projected CRS:  NAD83 / UTM zone 10N
##       BB_STRNAM BB_STRID    BB_FRO     BB_TO BB_SECID DIR_   Status ALT_bikeCA
## 1 Heinz/Russell      RUS       7th       8th    RUS01  E/W Existing         No
## 2 Heinz/Russell      RUS       8th       9th    RUS02  E/W Ezisting         No
## 3 Heinz/Russell      RUS       9th      10th    RUS03  E/W Existing         No
## 4 Heinz/Russell      RUS      10th San Pablo    RUS04  E/W Existing         No
## 5     San Pablo      RUS     Heinz   Russell    RUS05  N/S Existing         No
## 6       Russell      RUS San Pablo   Wallace    RUS06  E/W Exisitng          3
##   Shape_len len_km                       geometry
## 1 101.12817  0.101 MULTILINESTRING ((562293.8 ...
## 2 100.81407  0.101 MULTILINESTRING ((562391.6 ...
## 3 100.03740  0.100 MULTILINESTRING ((562489 41...
## 4 106.59288  0.107 MULTILINESTRING ((562585.7 ...
## 5  89.56348  0.090 MULTILINESTRING ((562688.9 ...
## 6  76.95699  0.077 MULTILINESTRING ((562711.4 ...

4.2 Alameda County Schools

Alright! Now that we have our bike boulevard data squared away, we’re going to bring in our Alameda County school data.

schools_df = read.csv('notebook_data/alco_schools.csv')
head(schools_df)
##           X        Y                      Site               Address    City
## 1 -122.2388 37.74476 Amelia Earhart Elementary 400 Packet Landing Rd Alameda
## 2 -122.2519 37.73900       Bay Farm Elementary   200 Aughinbaugh Way Alameda
## 3 -122.2589 37.76206  Donald D. Lum Elementary    1801 Sandcreek Way Alameda
## 4 -122.2348 37.76525         Edison Elementary  2700 Buena Vista Ave Alameda
## 5 -122.2381 37.75396     Frank Otis Elementary      3010 Fillmore St Alameda
## 6 -122.2616 37.76911       Franklin Elementary  1433 San Antonio Ave Alameda
##   State Type API    Org
## 1    CA   ES 933 Public
## 2    CA   ES 932 Public
## 3    CA   ES 853 Public
## 4    CA   ES 927 Public
## 5    CA   ES 894 Public
## 6    CA   ES 893 Public
dim(schools_df)
## [1] 550   9

Questions Without looking ahead:

  1. Is this a geodataframe?
  2. How do you know?



This is not an sf data.frame! A couple of clues to figure that out are..

  1. We’re pulling in a Comma Separated Value (CSV) file, which is not a geospatial data format
  2. There is no geometry column (although we do have latitude and longitude values)

Although our school data is not starting off as an sf data.frame, we actually have the tools and information to make it one. Using the st_as_sf function, we can coerce our plain data.frame into an sf data.frame (specifying the columns containings the points’ coordinates and the EPSG code of the CRS).

schools_sf = st_as_sf(schools_df,
                       coords = c('X', 'Y'),
                       crs = 4326)
head(schools_sf)
## Simple feature collection with 6 features and 7 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -122.2616 ymin: 37.739 xmax: -122.2348 ymax: 37.76911
## geographic CRS: WGS 84
##                        Site               Address    City State Type API    Org
## 1 Amelia Earhart Elementary 400 Packet Landing Rd Alameda    CA   ES 933 Public
## 2       Bay Farm Elementary   200 Aughinbaugh Way Alameda    CA   ES 932 Public
## 3  Donald D. Lum Elementary    1801 Sandcreek Way Alameda    CA   ES 853 Public
## 4         Edison Elementary  2700 Buena Vista Ave Alameda    CA   ES 927 Public
## 5     Frank Otis Elementary      3010 Fillmore St Alameda    CA   ES 894 Public
## 6       Franklin Elementary  1433 San Antonio Ave Alameda    CA   ES 893 Public
##                     geometry
## 1 POINT (-122.2388 37.74476)
## 2   POINT (-122.2519 37.739)
## 3 POINT (-122.2589 37.76206)
## 4 POINT (-122.2348 37.76525)
## 5 POINT (-122.2381 37.75396)
## 6 POINT (-122.2616 37.76911)
dim(schools_sf)
## [1] 550   8

You’ll notice that the shape is almost the same as what we had as a data.frame, except with one less column (because the two coordinate columns, X, and Y, were consumed into a single geometry column.

Now that it’s an sf data.frame, we can use the fancy plot method for it just as we did for our other data sets. Notice that this is our first point dataset.

plot(schools_sf)

But of course we’ll want to transform the CRS, so that we can later plot it with our bike boulevard data.

schools_utm10 = st_transform(schools_sf, crs = 26910)

And keep in mind that we can always use tmap to plot any of our datasets.

Here’s how we’d use tmap for point data:

tm_shape(schools_utm10) +
  tm_dots(col='green', size=0.2)

In Lesson 2 we discussed that you can save out sf data.frames in multiple file formats. You could opt for a GeoJSON, a shapefile, etc… However, for point data sets it is also an option to save it out as a CSV since each geometry only has a single X and single Y value.

Exercise: Even More Data!

Let’s play around with another points dataset.

In the code cell provided below, compose code to:

  1. Read in the parcel points data (notebook_data/parcels/parcel_pts_rand30pct.geojson)
  2. Set the CRS to be 4326
  3. Transform the CRS to 26910
  4. Use tmap to plot and customize as desired!

To see the solution, look at the hidden text below.

# YOUR CODE HERE:

Solution hidden here!


4.3 Map Overlays with Matplotlib

No matter the geometry type we have for our sf data.frame, we can create overlay plots.

Since we’ve already done the legwork of transforming our CRS, we can go ahead and plot them together.

tm_shape(schools_utm10) + 
  tm_dots(size=0.1) +
tm_shape(bike_blvds_utm10) +
  tm_lines(col='red')

If we want to answer questions like “What schools are close to bike boulevards in Berkeley?”, the above plot isn’t super helpful, since the extent covers all of Alameda county.

Luckily, it is easy for us to crop a sf data.frame, so that we only retain the rows whose geometries are within the bounding box (or extent) of another dataset.

We do this with the st_crop function.

schools_utm10_crop = st_crop(schools_utm10, bike_blvds_utm10)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries

Now what’s see what that last plot looks like using our cropped data.

tm_shape(schools_utm10_crop) + 
  tm_dots(size=0.1) +
tm_shape(bike_blvds_utm10) +
  tm_lines(col='red')

4.4 Recap

In this lesson we learned a several new skills:

  • Transformed an aspatial data.frame into an sf data.frame
  • Worked with point and line datasets
  • Overlayed point and line datasets
  • Cropped one dataset to the extent of another
    • st_crop

Exercise: Overlay Mapping

Let’s take some time to practice reading in and reconciling new datasets, then mapping them together.

In the code cell provided below, write code to:

  1. Bring in your Berkeley places shapefile (and don’t forget to check/transform the crs!) (notebook_data/berkeley/BerkeleyCityLimits.shp)
  2. Overlay the parcel points on top of the bike boulevards
  3. Create the same plot, but limit it to Berkeley by cropping to the extent of Berkeley city limits

BONUS: Add the Berkeley outline to your last plot!

To see the solution, look at the hidden text below.

# YOUR CODE HERE:

Solution hidden here!


4.5 Teaser for Day 2…

You may be wondering if and how we could make our maps more interesting and informative than we have so far.

To give you a tantalizing taste of Day 2, the answer is: Yes, we can! And here’s how (using an approach we hinted at earlier on)!

tm_shape(schools_utm10) + 
  tm_dots(col = 'Org', palette = 'RdYlGn',
          size = 0.15, border.col = 'black',
          title='Public and Private Schools, Alameda County')

 D-Lab @ University of California - Berkeley
 Team Geo